Dataset

Descripción del dataset

El conjunto de datos ha sido tomado de [data.world] (https://data.world/martinchek/2012-2016-facebook-posts) y pertenecen a un [artículo] (https://shift.newco.co/2016/11/09/What-I-Discovered-About-Trump-and-Clinton-From-Analyzing-4-Million-Facebook-Posts/) que analizó 4 millones de publicaciones de Facebook para observar que información había sobre Donald Trump y Hilary Clinton.

Por ello, el conjunto contiene publicaciones de Facebook escritos entre 2012 y 2016 por 15 de los principales medios de comunicación.

Los datos se encuentran agrupados (en csv) por medio de comunicación, por lo que para este caso solo he escogido 5 de esos medios para reducir los costes que puede tener analizar un conjunto de datos tan grande.

Por tanto, el dataset generado se trata de un conjunto de 154.173 filas, cada una de las cuales corresponde a una publicación de Facebook realizada por algunas de las siguientes páginas:

  • BBC (id 228735667216): The British Broadcasting Corporation.

  • CNN (id 5550296508): Multinational cable news channel headquartered in Atlanta, Georgia, U.S.

  • Fox (id 15704546335): The Fox News Channel.

  • ABC (id 86680728811): ABC News is the news division of the American broadcast network ABC.

  • LA Times (id 5863113009): Los Angeles Times: News from California, the nation and world.

Previsualización de los datos

  X                             id      page_id
1 1 228735667216_10151160265682217 228735667216
2 2 228735667216_10151161802472217 228735667216
3 3 228735667216_10151162236982217 228735667216
4 6 228735667216_10151799077647217 228735667216
5 7 228735667216_10151948038612217 228735667216
6 8 228735667216_10151948251032217 228735667216
                                                    name
1                                        BBC News Photos
2 US congress in final push to reach 'fiscal cliff' deal
3                                        BBC News Photos
4                                        Timeline Photos
5                                        Timeline Photos
6                                        Timeline Photos
                                                                                                                                                                                                                                                                                                                                   message
1                                                                                                                                                                                                                                                                             Your best photographs of 2012. GALLERY: http://bbc.in/UvZxmS
2                                                           London's leading shares have fallen amid fears that budget talks won't stop the US sliding over the 'fiscal cliff'.  http://bbc.in/Uz888cAre you worried about the impact to the global economy, if US senators don't reach a deal before New Year's Day? http://bbc.in/Uz86x7
3                                                                                                                                                                                                  In Pictures: New Year's celebrations from around the world, some dazzling, others poignant and one that's a first http://bbc.in/130CWBE
4 Do you think your life will get better or worse, or stay the same, in 2014?  http://bbc.in/KeY2bO Nearly 50% of people believe things will improve for them in the coming year, according to a global survey conducted annually since 1977.How far do you agree with the findings?  And how would you describe your year, in a nutshell?
5        China's President Xi Jinping is on his way to the Netherlands with a 200-strong business delegation on his first trip to Europe as leader http://bbc.in/1ra3ptNHis tour is expected to be dominated by trade, but Mr Xi is also likely to face pressure from Western powers to be firmer with Russia over its actions in Ukraine.
6                                                                                                                                                                                                 The letter Malaysia's acting Transport Minister Hishammuddin Hussein was handed during the #MH370 press conference http://bbc.in/1oJICLr
   description post_type     status_type likes_count comments_count
1 not provided     photo    added_photos         242              6
2                   link published_story          39             56
3 not provided     photo    added_photos         545             55
4 not provided     photo    added_photos         824            470
5 not provided     photo    added_photos         601            109
6 not provided     photo    added_photos        2230            247
  shares_count love_count wow_count haha_count sad_count thankful_count
1           45          0         0          0         0              0
2            0          0         0          0         0              0
3          133          0         0          0         0              0
4          163          0         0          0         0              0
5           61          0         0          0         0              0
6          352          0         0          0         0              0
  angry_count  posted_at       date     hour page_name
1           0 2012-12-30 2012-12-30 09:16:36       bbc
2           0 2012-12-31 2012-12-31 14:16:43       bbc
3           0 2012-12-31 2012-12-31 20:13:56       bbc
4           0 2013-12-30 2013-12-30 09:00:50       bbc
5           0 2014-03-22 2014-03-22 06:17:57       bbc
6           0 2014-03-22 2014-03-22 11:18:40       bbc

Resumen de los campos

       X               id               page_id                 name          
 Min.   :     1   Length:135544      Min.   :  5550296508   Length:135544     
 1st Qu.: 38929   Class :character   1st Qu.:  5863113009   Class :character  
 Median : 75102   Mode  :character   Median : 15704546335   Mode  :character  
 Mean   : 75797                      Mean   : 60085410520                     
 3rd Qu.:113203                      3rd Qu.: 86680728811                     
 Max.   :150983                      Max.   :228735667216                     
   message          description         post_type         status_type       
 Length:135544      Length:135544      Length:135544      Length:135544     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
  likes_count      comments_count      shares_count       love_count      
 Min.   :      1   Min.   :     0.0   Min.   :      0   Min.   :     0.0  
 1st Qu.:    568   1st Qu.:    82.0   1st Qu.:     88   1st Qu.:     0.0  
 Median :   1745   Median :   226.0   Median :    282   Median :     0.0  
 Mean   :   5663   Mean   :   795.6   Mean   :   1573   Mean   :   132.5  
 3rd Qu.:   4678   3rd Qu.:   673.0   3rd Qu.:    972   3rd Qu.:     9.0  
 Max.   :1155249   Max.   :770366.0   Max.   :1934157   Max.   :213998.0  
   wow_count          haha_count         sad_count      thankful_count     
 Min.   :    0.00   Min.   :    0.00   Min.   :     0   Min.   :   0.0000  
 1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:     0   1st Qu.:   0.0000  
 Median :    0.00   Median :    0.00   Median :     0   Median :   0.0000  
 Mean   :   58.12   Mean   :   63.62   Mean   :   105   Mean   :   0.1321  
 3rd Qu.:   12.00   3rd Qu.:    4.00   3rd Qu.:     3   3rd Qu.:   0.0000  
 Max.   :45708.00   Max.   :45122.00   Max.   :149098   Max.   :1541.0000  
  angry_count         posted_at             date               hour          
 Min.   :     0.00   Length:135544      Length:135544      Length:135544     
 1st Qu.:     0.00   Class :character   Class :character   Class :character  
 Median :     0.00   Mode  :character   Mode  :character   Mode  :character  
 Mean   :    90.87                                                           
 3rd Qu.:     3.00                                                           
 Max.   :117430.00                                                           
  page_name        
 Length:135544     
 Class :character  
 Mode  :character  
                   
                   
                   

General

Row

Número de Publicaciones

135544

Número de páginas

5

Número de likes

767605124

Row

Número de enlaces

21

Número de vídeos

21

Número de canciones

21

Row

Número de eventos

21

Número de fotos

22

Número de comentarios

21

Row

Número de publicaciones según el tipo

Número de publicaciones por página y tipo

Row

Agrupación por año

Visión global

Row

Número de publicaciones por página y año

Reacciones

Row

Reacciones BBC

Reacciones CNN

Reacciones ABC

Row

Reacciones The Fox

Reacciones LA Times

Row

Donut chart: Reacciones BBC

Donut chart: Reacciones CNN

Donut chart: Reacciones BBC

Row

Donut chart: Reacciones The Fox

Donut chart: Reacciones LA Times

Análisis de textos y sentimientos

Row

Basketball

Filtered word: Basketball

Image

Filtered word: Image

Row

Normal

Wordcloud normal

Color

Wordcloud cambiando el color

Forma

Wordcloud cambiando la forma

Row

Análisis de sentimientos (AFINN)

Row

Red de correlaciones para el filtro Basketball

---
title: "HR Movement dashboard"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    source_code: embed
---


```{r setup, include=FALSE}
# Dashboard
library(flexdashboard)
# Data manipulation
library(tidyverse)      # data manipulation & plotting
library(stringr)        # text cleaning and regular expressions
library(tidytext)       # provides additional text mining functions
library(textdata)
library(dplyr)
# Plots
library(viridis)
library(wordcloud2)
#devtools::install_github("gaospecial/wordcloud2")
#library("gaospecial/wordcloud2")
library(RColorBrewer)
library(tm)
library(ggplot2)
library(emojifont)
library(plotly)
library(gridExtra)
# PieChart
library(lessR)
# Network
library(igraph)
library(ggraph)
library(ggiraph)
#devtools::install_github("dgrtwo/drlib")


```


```{r}
data <- read.csv("facebook_dataset.csv", header=TRUE, stringsAsFactors = FALSE)
data$page_name<-""
data[data$page_id=="86680728811",]$page_name<-"abc"
data[data$page_id=="228735667216",]$page_name<-"bbc"
data[data$page_id=="5550296508",]$page_name<-"cnn"
data[data$page_id=="15704546335",]$page_name<-"fox"
data[data$page_id=="5863113009",]$page_name<-"laTimes"
```


Dataset {data-icon="fa-table"}
=============================

### Descripción del dataset

El conjunto de datos ha sido tomado de [data.world] (https://data.world/martinchek/2012-2016-facebook-posts) y pertenecen a un [artículo] (https://shift.newco.co/2016/11/09/What-I-Discovered-About-Trump-and-Clinton-From-Analyzing-4-Million-Facebook-Posts/) que analizó 4 millones de publicaciones de Facebook para observar que información había sobre Donald Trump y Hilary Clinton. 

Por ello, el conjunto contiene publicaciones de Facebook escritos entre 2012 y 2016 por 15 de los principales medios de comunicación.

Los datos se encuentran agrupados (en csv) por medio de comunicación, por lo que para este caso solo he escogido 5 de esos medios para reducir los costes que puede tener analizar un conjunto de datos tan grande.


Por tanto, el dataset generado se trata de un conjunto de 154.173 filas, cada una de las cuales corresponde a una **publicación de Facebook** realizada por algunas de las siguientes páginas:

- **BBC** (id 228735667216): The British Broadcasting Corporation.

- **CNN** (id 5550296508): Multinational cable news channel headquartered in Atlanta, Georgia, 
U.S.

- **Fox** (id 15704546335): The Fox News Channel.

- **ABC** (id 86680728811): ABC News is the news division of the American broadcast network ABC.

- **LA Times** (id 5863113009): Los Angeles Times: News from California, the nation and world.

### Previsualización de los datos

```{r}
head(data)
```
### Resumen de los campos

```{r}
summary(data)

```


General {data-icon="fa-chart-line"}
=============================

Row {data-width=150}
--------------------------------------
### Número de Publicaciones
```{r}
total_post <- nrow(data)

valueBox(value = total_post,icon = "fa-facebook",caption = "Número de Publicaciones",color = "#C1FFC1")
```

### Número de páginas
```{r}
total_pages <- length(unique(data$page_id))

valueBox(value = total_pages,icon = "fa-file",caption = "Número de páginas", color = "#FFD700")
```

### Número de likes
```{r}
likes_total <- sum(data$likes_count)

valueBox(value = likes_total,icon = "fa-thumbs-up",caption = "Número de likes", color = "#FFEC8B")
```


Row {data-width=150}
--------------------------------------
### Número de enlaces

```{r}
link_post <- length(data[data$post_type == "link",])

valueBox(value = link_post,icon = "fa-link",caption = "Número de enlaces",color = "#EEEEE0")
```

### Número de vídeos
```{r}
video_post <- length(data[data$post_type == "video",])

valueBox(value = video_post,icon = "fa-video",caption = "Número de vídeos", color = "#BF3EFF")
```

### Número de canciones
```{r}
music_post <- length(data[data$post_type == "music",])

valueBox(value = music_post,icon = "fa-music",caption = "Número de canciones", color = "#7FFFD4")
```


Row {data-width=150}
--------------------------------------
### Número de eventos

```{r}
event_post <- length(data[data$post_type == "event",])

valueBox(value = event_post,icon = "fa-calendar",caption = "Número de eventos",color = "#FFA07A")
```


### Número de fotos
```{r}
photo_post <- length(data[data$post_type == "photo",])

valueBox(value = 22,icon = "fa-image",caption = "Número de fotos", color = "#FF3030")
```

### Número de comentarios
```{r}
comments_total <- sum(data$comments_count)

valueBox(value = music_post,icon = "fa-comment",caption = "Número de comentarios", color = "#E0FFFF")
```



Row {data-height=500}
----------------------------------

<div style="height:500px">

### Número de publicaciones según el tipo

```{r}
data_post_type <- data %>% dplyr::group_by(post_type)

plot <- ggplot(data = data_post_type, aes(x = post_type, fill=post_type)) +
  geom_bar() +
  labs(x = "Tipo de publicación", y = "Nº publicaciones")
ggplotly(plot)
```

</div>


### Número de publicaciones por página y tipo

```{r}
data_by_page_type <- data %>% dplyr::group_by(page_name, post_type) %>%
  dplyr::summarise(total = n()) 

# Small multiple
plot <- ggplot(data_by_page_type, aes(fill=post_type, y=total, x=page_name)) + 
    geom_bar(position="stack", stat="identity") +
    scale_fill_viridis(discrete = T) +
    xlab("Página") +
  ylab("Nº publicaciones")
ggplotly(plot)
```


Row 
----------------------------------

<div style="height:500px">
### Agrupación por año 

```{r}
data_month <- data %>% dplyr::group_by(lubridate::month(date),lubridate::year(date)) %>%
  dplyr::summarise(total = n()) %>% dplyr::arrange(desc(total))
colnames(data_month) <- c("month", "year", "total")

data_month$month <- as.character(data_month$month)
data_month$year <- as.character(data_month$year)
data_month$month <- factor(data_month$month, levels = c("1","2","3","4","5","6", "7","8","9", "10", "11", "12"), labels = c("01","02","03","04","05","06", "07","08","09", "10", "11", "12"))
data_month <- data_month[order(data_month$month),]


plot <- ggplot(data=data_month, mapping=aes(x=month, y=total, shape=year, color=year)) +
  geom_point() +
  geom_line(aes(group = 1)) + 
  facet_grid(facets = year ~ ., margins = FALSE) + theme_bw() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), plot.title = element_text(hjust = 0.5)) +
  ggtitle("Número de publicaciones por mes y año") +
  xlab("Mes") +
  ylab("Nº publicaciones")

ggplotly(plot)
```

</div>

<div style="height:500px">
### Visión global

```{r}
plot <- ggplot(data=data_month, mapping=aes(x=month, y=total, shape=year, color=year)) +
  geom_point() +
  geom_line(aes(group = 1))+ 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), plot.title = element_text(hjust = 0.5))+  
  ggtitle("Número de publicaciones por mes y año") + 
  xlab("Mes") + 
  ylab("Nº publicaciones")

ggplotly(plot)
```
</div>

Row 
----------------------------------

<div style="height:600px">

### Número de publicaciones por página y año 

```{r}
data_by_page <- data %>% dplyr::group_by(page_name, lubridate::year(date)) %>%
  dplyr::summarise(total = n()) %>% dplyr::arrange(desc(total))

colnames(data_by_page) <- c("page", "year", "total")
data_by_page <- data_by_page[!is.na(data_by_page$page),]
data_by_page$page <- as.character(data_by_page$page)

plot <- ggplot(data_by_page, aes(year, page)) + 
  geom_point(aes(size=total), colour = "red") +
  scale_color_manual(values=c("black", "dodgerblue")) +
  ggtitle("Publicaciones por página y año") +
  theme(legend.position = 'none', plot.title = element_text(hjust = 0.5)) +
  xlab("Año") + 
  ylab("Página")

ggplotly(plot)
```

</div>


Reacciones {data-icon="fa-comments"}
===

Row 
-----------------------------------------------------------------------


```{r}
emojis <- c(#emoji("email"),# share
  #emoji("thumbsup"),# like
  emoji("heart"),# love
  emoji("joy"),# haha
  emoji("cry"),# sad
  emoji("rage"),# angry
  emoji("innocent")# thankfull
  )

#names <- c("shares","likes","love","haha","sad","angry", "thankfull")
names <- c("love","haha","sad","angry", "thankfull")

```



<div style="height:400px">

### Reacciones BBC

```{r}
data_bbc <- tibble(names = names,
       emoji = emojis,
       total = c(#sum(data$shares_count[data$page_name=="bbc"]),
                 #sum(data$likes_count[data$page_name=="bbc"]),
                 sum(data$love_count[data$page_name=="bbc"]),
                 sum(data$haha_count[data$page_name=="bbc"]),
                 sum(data$sad_count[data$page_name=="bbc"]),
                 sum(data$angry_count[data$page_name=="bbc"]),
                 sum(data$thankful_count[data$page_name=="bbc"])
                 ))

plot <- ggplot(data_bbc, aes(names, total, label = emoji, fill=names)) + 
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_bbc$names, labels = data_bbc$emoji) +
  ggtitle("Reacciones a las publicaciones de BBC") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
    axis.ticks.x=element_blank()) +
  xlab("Reacción")

ggplotly(plot)
```

</div>

### Reacciones CNN

```{r}
data_cnn <- tibble(names = names,
       emoji = emojis,
       total = c(#sum(data$shares_count[data$page_name=="cnn"]),
                 #sum(data$likes_count[data$page_name=="cnn"]),
                 sum(data$love_count[data$page_name=="cnn"]),
                 sum(data$haha_count[data$page_name=="cnn"]),
                 sum(data$sad_count[data$page_name=="cnn"]),
                 sum(data$angry_count[data$page_name=="cnn"]),
                 sum(data$thankful_count[data$page_name=="cnn"])
                 ))

plot <- ggplot(data_cnn, aes(names, total, label = emoji, fill=names)) + 
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_cnn$names, labels = data_cnn$emoji) +
  ggtitle("Reacciones a las publicaciones de CNN") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
    axis.ticks.x=element_blank()) +
  xlab("Reacción")

ggplotly(plot)

```


### Reacciones ABC

```{r}
data_abc <- tibble(names = names,
       emoji = emojis,
       total = c(#sum(data$shares_count[data$page_name=="bbc"]),
                 #sum(data$likes_count[data$page_name=="bbc"]),
                 sum(data$love_count[data$page_name=="abc"]),
                 sum(data$haha_count[data$page_name=="abc"]),
                 sum(data$sad_count[data$page_name=="abc"]),
                 sum(data$angry_count[data$page_name=="abc"]),
                 sum(data$thankful_count[data$page_name=="abc"])
                 ))

plot <- ggplot(data_abc, aes(names, total, label = emoji, fill=names)) + 
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_abc$names, labels = data_abc$emoji) +
  ggtitle("Reacciones a las publicaciones de ABC") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
    axis.ticks.x=element_blank()) +
  xlab("Reacción")

ggplotly(plot)
```

Row 
-----------------------------------------------------------------------

<div style="height:400px">


### Reacciones The Fox

```{r}
data_fox <- tibble(names = names,
       emoji = emojis,
       total = c(#sum(data$shares_count[data$page_name=="fox"]),
                 #sum(data$likes_count[data$page_name=="fox"]),
                 sum(data$love_count[data$page_name=="fox"]),
                 sum(data$haha_count[data$page_name=="fox"]),
                 sum(data$sad_count[data$page_name=="fox"]),
                 sum(data$angry_count[data$page_name=="fox"]),
                 sum(data$thankful_count[data$page_name=="fox"])
                 ))

plot <- ggplot(data_fox, aes(names, total, label = emoji, fill=names)) + 
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_fox$names, labels = data_fox$emoji) +
  ggtitle("Reacciones a las publicaciones de The Fox News Channel") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
    axis.ticks.x=element_blank()) +
  xlab("Reacción")

ggplotly(plot)
```

</div>

### Reacciones LA Times

```{r}
data_laTimes <- tibble(names = names,
       emoji = emojis,
       total = c(#sum(data$shares_count[data$page_name=="laTimes"]),
                 #sum(data$likes_count[data$page_name=="laTimes"]),
                 sum(data$love_count[data$page_name=="laTimes"]),
                 sum(data$haha_count[data$page_name=="laTimes"]),
                 sum(data$sad_count[data$page_name=="laTimes"]),
                 sum(data$angry_count[data$page_name=="laTimes"]),
                 sum(data$thankful_count[data$page_name=="laTimes"])
                 ))

plot <- ggplot(data_laTimes, aes(names, total, label = emoji, fill=names)) + 
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_laTimes$names, labels = data_laTimes$emoji) +
  ggtitle("Reacciones a las publicaciones de Los Angeles Times") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
    axis.ticks.x=element_blank()) +
  xlab("Reacción")

ggplotly(plot)
```


Row 
-----------------------------------------------------------------------

<div style="height:400px">


### Donut chart: Reacciones BBC

```{r, message=FALSE, warning=FALSE,  results='hide'}

lessR::PieChart(x=names, 
                y =total, 
                data = data_bbc,
                fill = "viridis", 
                hole = 0.5, 
                main = "BBC",
                values_size=2,
                labels_cex=2,
                main_cex=2)
```

</div>


### Donut chart: Reacciones CNN

```{r, message=FALSE, warning=FALSE,  results='hide'}

lessR::PieChart(x=names, 
                y =total, 
                data = data_cnn,
                fill = "viridis", 
                hole = 0.5, 
                main = "CNN",
                values_size=2,
                labels_cex=2,
                main_cex=2)
```

### Donut chart: Reacciones BBC

```{r, message=FALSE, warning=FALSE,  results='hide'}

lessR::PieChart(x=names, 
                y =total, 
                data = data_bbc,
                fill = "viridis", 
                hole = 0.5, 
                main = "BBC",
                values_size=2,
                labels_cex=2,
                main_cex=2)
```


Row 
-----------------------------------------------------------------------

<div style="height:400px">

### Donut chart: Reacciones The Fox


```{r, message=FALSE, warning=FALSE,  results='hide'}

lessR::PieChart(x=names, 
                y =total, 
                data = data_fox,
                fill = "viridis", 
                hole = 0.5, 
                main = "The Fox",
                values_size=2,
                labels_cex=2,
                main_cex=2)
```

</div>

### Donut chart: Reacciones LA Times


```{r, message=FALSE, warning=FALSE,  results='hide'}

lessR::PieChart(x=names, 
                y =total, 
                data = data_laTimes,
                fill = "viridis", 
                hole = 0.5, 
                main = "LA Times",
                values_size=2,
                labels_cex=2,
                main_cex=2)

```


Análisis de textos y sentimientos {data-icon="fa-user"}
===

Row {.tabset .tabset-fade}
--------


```{r}
get_bigram_filtered <- function(filter_word){
  
  filtered_data <- data %>% dplyr::filter(stringr::str_detect(tolower(message), filter_word))
  filtered_data <- filtered_data[filtered_data$post_type!="link" &    filtered_data$post_type!="photo" ,]
  filtered_data <- filtered_data[, c('page_name', 'name', 'message')]
  
  data_bigram <- filtered_data %>% unnest_tokens(bigram, message, token = "ngrams", n = 2)
  
  data_bigram <- data_bigram %>%
    separate(bigram, c("word1", "word2"), sep = " ") %>%
    filter(!word1 %in% stop_words$word,
           !word2 %in% stop_words$word) %>%
    count(page_name, word1, word2, sort = TRUE) %>%
    unite("bigram", c(word1, word2), sep = " ") %>%
    dplyr::filter(!stringr::str_detect(tolower(bigram), "http")) %>%
    dplyr::filter(!stringr::str_detect(tolower(bigram), "bbc")) %>%
    dplyr::filter(!stringr::str_detect(tolower(bigram), "fox"))

  data_bigram <- by(data_bigram, data_bigram["page_name"], head, n=8)
  data_bigram <- Reduce(rbind, data_bigram)
  data_bigram
}
```

<div style="height:600px">

### Basketball

```{r}
data_bigram_sport <- get_bigram_filtered("basketball")
```


#### Filtered word: Basketball

```{r}
plot1 <- data_bigram_sport %>% 
  mutate(page_name = factor(page_name) %>% forcats::fct_rev()) %>%
  ggplot(aes(drlib::reorder_within(bigram, n, page_name), n, fill = page_name)) +
  geom_bar(stat = "identity", alpha = .8, show.legend = FALSE) +
  drlib::scale_x_reordered() +
  facet_wrap(~ page_name, ncol = 2, scales = "free") +
  coord_flip() +
  ylab("") +
  xlab("") +
  ggtitle("Pares de palabras con mayor ocurrencia por página filtrando por: Basketbal") +
  theme(legend.position = 'none', plot.title = element_text(hjust = 0.5))


ggplotly(plot1) 
```

</div>


<div style="height:600px">

### Image

```{r}
data_bigram_image <- get_bigram_filtered("image")
```

#### Filtered word: Image


```{r}
plot1 <- data_bigram_image %>% 
  mutate(page_name = factor(page_name) %>% forcats::fct_rev()) %>%
  ggplot(aes(drlib::reorder_within(bigram, n, page_name), n, fill = page_name)) +
  geom_bar(stat = "identity", alpha = .8, show.legend = FALSE) +
  drlib::scale_x_reordered() +
  facet_wrap(~ page_name, ncol = 2, scales = "free") +
  coord_flip() +
  ylab("") +
  xlab("") +
  ggtitle("Pares de palabras con mayor ocurrencia por página filtrando por: Image") +
  theme(legend.position = 'none', plot.title = element_text(hjust = 0.5))
  

ggplotly(plot1) 
```

</div>


Row {.tabset .tabset-fade}
--------

```{r}
get_wordcloud_data <- function(filter_word){
  
  filtered_data <- data %>% dplyr::filter(stringr::str_detect(message, filter_word))
  text <- filtered_data$message
  docs <- tm::Corpus(VectorSource(text))
  
  docs <- docs %>%
    tm::tm_map(tm::removeNumbers) %>%
    tm::tm_map(tm::removePunctuation) %>%
    tm::tm_map(tm::stripWhitespace)
  docs <- tm::tm_map(docs, tm::content_transformer(tolower))
  docs <- tm::tm_map(docs, tm::removeWords, tm::stopwords("english"))
  
  dtm <- tm::TermDocumentMatrix(docs) 
  matrix <- as.matrix(dtm) 
  words <- sort(rowSums(matrix),decreasing=TRUE) 
  df <- data.frame(word = names(words),freq=words)
  df <- df %>% mutate_at(vars(word), function(x){gsub('[^ -~]', '', x)})
  df <- df[1:500, ]
}
```

```{r}
data_wordcloud <- get_wordcloud_data("sport")
```

<div style="height:600px">

### Normal


#### Wordcloud normal


```{r message=FALSE}
wordcloud2(data=data_wordcloud, size=1.6, color='random-dark')
```

</div>


<div style="height:600px">

### Color


#### Wordcloud cambiando el color


```{r message=FALSE}
wordcloud2(data=data_wordcloud, size=1.6, color='random-light', backgroundColor="black")
```

</div >

<div style="height:600px; display: flex; justify-content: center;">

### Forma


#### Wordcloud cambiando la forma


```{r message=FALSE}
wordcloud2(data_wordcloud, size = 0.7, shape = 'star')
```

</div>

Row
------

<div style="height:600px">

### Análisis de sentimientos (AFINN)

```{r}
data_bigram_all <- data[, c('page_name', 'name', 'message')]
data_bigram_all <- data_bigram_all %>% unnest_tokens(bigram, message, token = "ngrams", n = 2)

AFINN <- get_sentiments("afinn")

nots <- data_bigram_all %>%
        separate(bigram, c("word1", "word2"), sep = " ") %>%
        filter(word1 == "not")  %>%
        inner_join(AFINN, by = c(word2 = "word")) %>% 
        count(word2, value, sort = TRUE) 

plot <- nots %>%
        mutate(contribution = n * value) %>%
        arrange(desc(abs(contribution))) %>%
        head(20) %>%
        ggplot(aes(reorder(word2, contribution), n * value, fill = n * value > 0)) +
        geom_bar(stat = "identity", show.legend = FALSE) +
        xlab("Palabras precedidas por 'not'") +
        ylab("Sentimiento según el número de ocurrencia") +
        coord_flip() +
        theme(legend.position = 'none')
ggplotly(plot)

```

</div>

Row
-----

<div style="height:800px">

### Red de correlaciones para el filtro Basketball


```{r}

filtered_data <- data %>% dplyr::filter(stringr::str_detect(tolower(message), "trump"))
filtered_data <- filtered_data[, c('page_name', 'name', 'message')]

data_bigram <- filtered_data %>% unnest_tokens(bigram, message, token = "ngrams", n = 2)
data_bigram <- data_bigram %>%
  dplyr::filter(!stringr::str_detect(tolower(bigram), "http")) %>%
  dplyr::filter(!stringr::str_detect(tolower(bigram), "bbc")) %>%
  dplyr::filter(!stringr::str_detect(tolower(bigram), "fox"))

bigram_graph <- data_bigram %>%
        separate(bigram, c("word1", "word2"), sep = " ") %>%
        filter(!word1 %in% stop_words$word,
               !word2 %in% stop_words$word) %>%
        count(word1, word2, sort = TRUE) %>%
        unite("bigram", c(word1, word2), sep = " ") %>%
        filter(n > 50) %>%
        graph_from_data_frame()
```

```{r}
set.seed(123)

a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

plot <- ggraph(bigram_graph, layout = "fr") +
        geom_edge_link() +
        geom_node_point(color = vcount(bigram_graph) , size = 5) +
        geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
        theme_void()

girafe(ggobj = plot, width_svg = 10, height_svg = 10,
       options = list(opts_sizing(rescale = TRUE, width = .6)))

```


</div>